ED 118 666

DOCUMENT RESUME

TM 005 093

AUTHOR          Pritchard, Robert D.; And Others
TITLE           Development and Evaluation of an Objective Technique
                to Assess Effort in Training. Final Report.
INSTITUTION     Institute for Organizational Behavior Research,
                Lafayette, Ind.
SPONS AGENCY    Air Force Human Resources Lab., Lowry AFB, Colo.
                Technical Training Div.
REPORT NO       AFHRL-TR-75-39
PUB DATE        Oct 75
NOTE            51p.

EDRS PRICE      MF-$0.83 HC-$3.50 Plus Postage
DESCRIPTORS     *Ability; Comparative Analysis; *Incentive Systems;
                *Measurement; *Military Training; *Motivation;
                *Objective Tests; Performance Tests; Predictive
                Ability (Testing); Technical Education; Test
                Validity
IDENTIFIERS     Air Force; *Effort Measurement

ABSTRACT

This research explored the validation of a quantifiable, objective, and reliable method of measuring the amount of effort to be directly rewarded in incentive systems. A battery of relevant ability tests was given to a sample of Air Force trainees and to civilian subjects using a simulation of the course taught the Air Force trainees. Results showed that simulation subjects were comparable to the Air Force subjects and that the ability test battery predicted performance equally well for both samples. The hard criterion of effort displayed wide variability, excellent reliability, and good construct validity. (Author)







AFHRL-TR-75-39 



AIR FORCE







DEVELOPMENT AND EVALUATION OF AN OBJECTIVE
TECHNIQUE TO ASSESS EFFORT IN TRAINING



By 

Robert D. Pritchard 
John H. Hollenback 



Institute for Organizational Behavior Research
Lafayette, Indiana 47901



Philip J. DeLeo 
TECHNICAL TRAINING DIVISION



Lowry Air Force Base, Colorado 80230 



October 1975 
Final Report for Period November 1973 - April 1975




AIR FORCE SYSTEMS COMMAND 

BROOKS AIR FORCE BASE, TEXAS 78235



NOTICE 



When US Government drawings, specifications, or other data are used for any purpose other than a definitely related Government procurement operation, the Government thereby incurs no responsibility nor any obligation whatsoever, and the fact that the Government may have formulated, furnished, or in any way supplied the said drawings, specifications, or other data is not to be regarded by implication or otherwise, as in any manner licensing the holder or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented invention that may in any way be related thereto.

This final report was submitted by the Institute for Organizational Behavior Research, Lafayette, Indiana 47901, under contract F41609-74-C-0010, project 1141, with Technical Training Division, Air Force Human Resources Laboratory (AFSC), Lowry Air Force Base, Colorado 80230. Major Philip J. DeLeo, Instructional Technology Branch, was the contract monitor.

This report has been reviewed and cleared for open publication and/or public release by the appropriate Office of Information (OI) in accordance with AFR 190-17 and DoDD 5230.9. There is no objection to unlimited distribution of this report to the public at large, or by DDC to the National Technical Information Service (NTIS).

This technical report has been reviewed and is approved.

MARTY R. ROCKWAY, Technical Director
Technical Training Division


Approved for publication.

HAROLD E. FISCHER, Colonel, USAF
Commander 



Unclassified 






SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE

READ INSTRUCTIONS
BEFORE COMPLETING FORM

1. REPORT NUMBER    2. GOVT ACCESSION NO.

AFHRL-TR-75-39

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

DEVELOPMENT AND EVALUATION OF AN OBJECTIVE
TECHNIQUE TO ASSESS EFFORT IN TRAINING


5. TYPE OF REPORT & PERIOD COVERED

Final
November 1973 - April 1975

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

Robert D. Pritchard
John H. Hollenback
Philip J. DeLeo

8. CONTRACT OR GRANT NUMBER(s)

F41609-74-C-0010

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Institute for Organizational Behavior Research
Lafayette, Indiana 47901

10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS

61102F 
11410103 


11. CONTROLLING OFFICE NAME AND ADDRESS

Hq Air Force Human Resources Laboratory (AFSC)
Brooks Air Force Base, Texas 78235

12. REPORT DATE

October 1975

13. NUMBER OF PAGES

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

Technical Training Division
Air Force Human Resources Laboratory
Lowry Air Force Base, Colorado 80230


15. SECURITY CLASS. (of this report)

Unclassified

15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

technical training
incentive systems
student performance
effort measurement

20. ABSTRACT

This research explored the validation of a quantifiable, objective, and reliable method of measuring the amount of effort to be directly rewarded in incentive systems. A battery of relevant ability tests was given to a sample of Air Force trainees and to civilian subjects using a simulation of the course taught the Air Force trainees. Results showed that the simulation subjects were comparable to the Air Force subjects and that the ability test battery predicted performance equally well for both samples. The hard criterion of effort displayed wide variability, excellent reliability, and good construct validity.

DD FORM 1473, 1 JAN 73    EDITION OF 1 NOV 65 IS OBSOLETE




SUMMARY 

Problem




The problem addressed by this research concerned how to award incentives for achievement in training. A difficulty with incentive systems is that they provide the greatest payoff to high ability students. In fact, in traditional incentive systems, lower ability students may not get rewarded at all, regardless of how hard they try. The present study was part of a program to tailor incentives to the capabilities of each individual student. The major purpose, therefore, was to develop and evaluate a method of objectively measuring the effort exerted by a student in a technical training context. One particularly desirable characteristic of such a measure would be its potential usefulness in an effort-based incentive system.

The literature concerning physiological, rating, behavioral, and computational techniques for measuring effort was reviewed. As demonstrated by this review, many of the traditional measures have serious limitations. The approach taken in this study was based on the assumption that performance is largely a function of ability and motivation (effort). A logical consequence of this assumption is that a measure of effort can be obtained by partialling out the influence of ability on performance. Thus, a residual score derived in this manner would reflect what level of effort an individual was exerting.

For the purpose of evaluating this derived measure of effort, an 8 1/2-hour section of an Air Force technical training course was selected for study. This section was taken from the Aircraft Electrical Repairman Course (3ABR42330), conducted at Chanute AFB, Illinois. Following an analysis of the course material, a battery of relevant ability tests was given to a sample of Air Force trainees in the target course. Regression equations predicting performance on the course material were then developed and cross validated.

Using civilian subjects whose personal characteristics approximated the Air Force population, a simulation of the selected course was developed. These subjects took the same ability tests and worked on the same materials as did the Air Force subjects. To assure variability in effort, three pay systems were used: hourly, piece-rate, and variable ratio/variable amount. A second set of regression equations predicting performance was developed and cross validated for the simulation sample.
Derived effort scores were then calculated for the simulation subjects, using both the Air Force generated weights and self-generated weights, by subtracting from actual performance the performance level predicted on the basis of ability. Finally, the derived effort scores of the simulation subjects were correlated with a hard criterion of effort based on a photographic record.



Results 



Results of the study showed that the ability test battery predicted performance equally well for both samples. The hard criterion of effort displayed wide variability, excellent reliability, and good construct validity. The derived effort measure showed moderate correlations with actual effort. When the analyses were done with high and low ability subjects separately, the correlations were larger.


Conclusions

It was concluded that the derived effort index would not be adequate as an index of a single individual's effort, but could be quite useful in assessing differences in effort between groups. A number of specific practical applications were discussed. Calculation of derived effort scores was recommended for (1) the award of incentives to groups, (2) the award of incentives to lower ability students, (3) comparing the motivational characteristics of different courses or blocks within courses, (4) feedback to students and instructors about group effort, and (5) goal setting.






PREFACE 



This research was conducted under Project/Task/Work Unit 11410103, Psychological Factors in Instructional Systems Design.

We would like to thank the members of the research staff who made many valuable contributions to the project: James R. Terborg, Blair Clark, Lee Stepina, Gail Schmaltz and Francine Lamattina.

We would also like to thank the Air Force personnel at Chanute Air Force Base for their extensive cooperation: Mr. R. Mitchell, Mr. H. Mull, Lt. D. Gillman, and Mr. G. Scharf.






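Table of Contents

Introduction .......................................... 5
Review of the Literature .............................. 7
Conclusions from the Literature Review ................ 11
Methods and Procedures ................................ 13
Overview .............................................. 13
Selection of Task Material ............................ 13
Ability Testing
Dependent Variables ................................... 15
The Work Simulation ................................... 15
Results ............................................... 18
Overview .............................................. 18
Comparison of AF and Simulation Subjects .............. 18
Generation of Regression Equations .................... 19
Generation of Derived Effort Scores ................... 23
Evaluating the Hard Criterion of Effort ............... 24
Predictive Validity of the Derived Effort Measure ..... 26
Summary of the Results ................................ 30
Discussion and Conclusions ............................ 31
References ............................................ 36
Appendices ............................................ 39
A. Advertisement used in recruiting subjects .......... 39
B. Pre-employment electricity test .................... 41
C. Sample of an appraisal ............................. 43
D. Effort Questionnaire ............................... 46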






Introduction



The goal of training in the Air Force is to maximize each individual's contribution to the Air Force mission. To accomplish this goal, Air Training Command (ATC) attempts to train each man to accomplish specified criterion objectives consonant with field requirements. The issue thus becomes one of optimizing performance in the training setting and ultimately in the field. Performance can be thought of as being composed of two major components, ability and motivation (Vroom, 1964). Clearly, other classes of variables influence performance, but most would agree that ability and motivation are extremely important components of performance (Campbell, Dunnette, Lawler & Weick, 1970). This argument implies that to maximize performance, one could maximize ability and maximize motivation. The ability component can be dealt with by giving remedial instruction to low ability students (e.g., remedial reading courses) and by selecting instructional strategies to fit each individual's abilities and traits. This leaves the issue of motivation. One of the approaches to the motivation issue has been to investigate the feasibility of using incentive motivation techniques in an ATC training environment (e.g., Pritchard, Von Bergen and DeLeo, 1974).

The classical approach to incentive motivation has been to give valued rewards contingent on actual performance in some task. The important point is that rewards are given on the basis of performance. Thus, this classical approach would suggest that incentives be offered to airmen on the basis of their scores on exams and/or their speed of finishing sections of the course. However, there is a problem with this approach. Theories of human task motivation which deal with incentive motivation (e.g., Vroom, 1964; Porter and Lawler, 1968; Lawler, 1971) identify three components that influence motivation: (1) Valence of rewards: the value the individual places on incentives; (2) Performance-reward instrumentality: the perceived degree of connection between performance and obtaining the rewards; and (3) Effort-performance expectancy: the perceived degree of relationship between a person's level of effort and level of performance.

Only the first two of these components are generally considered by a classical incentive motivation approach. That is, valued rewards are identified and made contingent on performance (e.g., score on exams and speed of finishing). This ignores the issue of effort-performance expectancy. Specifically, high performance may be seen as highly valuable, but if a person sees no relationship between his level of effort and attaining high performance, he will not be motivated to attain high performance. For example, if a man were offered $10,000 to pick up a 2000-pound block of concrete, he would not attempt it even though the value of performing the feat would be very high. He would perceive that no matter how much effort he expended, he could not pick up the weight. Taking a more realistic example, an airman in tech school might value a 3-day pass very highly but feel that being a top performer would be impossible for him. Thus, the pass would not motivate him. The problem is even greater when we consider that it is the low ability student who sees little chance of being a top performer, yet it is precisely this student we wish to motivate.




Thus, what is additionally necessary is to deal somehow with the relationship between effort and performance. This could be accomplished, at least in principle, in a rather simple way. If one were to maximize the contingencies between effort and rewards, students of any ability level could be motivated. That is, if an incentive system were so structured as to give valued rewards for high levels of effort, all students should be motivated.

The purpose of the research described here was to develop and evaluate a technique for objectively assessing effort. If this could be done in an economical and objective way, it would become quite easy to give incentives on the basis of effort.

Aside from the general issue of giving rewards on the basis of effort, this research ties in directly with the development of an Advanced Instructional System (AIS) by the Air Force Human Resources Laboratory. The AIS is a computer-managed systems approach to training. One of the central requirements of the AIS is that training packages be individualized. This individualization covers not only the type of training an individual receives but also attempts to maximize the actual motivation of each student. Developing a system which assesses student effort, and which could be used to award incentives, is clearly consistent with the philosophy of individualizing the training package.






Review of the Literature¹
Definitions and Measurements of Effort



Techniques for measuring effort are as numerous as definitions of the term. For example, effort has been defined as a determinant of motivation (Atkinson, 1964); one component of motivation (Yacorzynski, 1942); the equivalent of motivation (Farquhar, 1963); and a measure of motivation (Lawler and Porter, 1967). Each of these definitions may indeed be appropriate depending upon the context in which the term is used. Regardless of the context, we would argue that effort is closely associated with the construct of motivation. As such, effort becomes a critical construct in the study of motivated behavior.

In this section, various techniques for measuring effort will be described and evaluated. The techniques to be discussed fall into four categories: physiological, rating, behavioral, and computational.

Physiological Techniques 

A very fundamental approach to the measurement of effort involves the use of physiological indicators. From this perspective, motivation, or effort, is viewed as a state of general arousal (Leukel, 1968). Generally, the level or degree of effort is inferred from the activity of the central or autonomic nervous systems. McClelland (1955) lists several indices which might reflect this activity level:

1. Energy expenditure (basal metabolic rate) 

2. Autonomic activity (skin conductance)

3. Thresholds (reaction time) 

4. Muscle activity (eye movements and action potentials) 

5. Central excitation level (determination of the frequency in cycles per second at which a flickering light fuses)

Other researchers have proposed additional physiological indices. Circulatory changes such as pulse rate (Bitterman, 1945) and pulse pressure (Lovekin, 1930) have been used to indicate effort expenditure. Conflicting reports suggest, however, that there is not an established technique for estimating effort by these changes in the circulatory system.

Measurements of muscular tension have also been used as indices of effort (Davis, 1939; Ryan, 1947; Solomon, 1946). Further, Icott (1960) proposed auditory flutter fusion (the frequency in cycles per second at which a fluttering sound fuses) as an index of effort.

A major problem with physiological indices of effort seems to be that such measures are confounded by a variety of factors: time of day that the measure was taken, amount of sleep that the subject had, amount of food consumed, task difficulty, fatigue, etc.



¹ This review is based heavily on two other reviews (Ramby, 1973; Mayo, 1974). The majority of the two reviews were done under the support of previous Air Force Human Resources Laboratory contracts; R.D. Pritchard, Principal Investigator.




Beyond the substantial problem of confounding variables, there exists the difficult, obtrusive, and expensive nature of obtaining physiological measures. It would appear that frequent readings using rather obvious and expensive equipment are necessary to obtain good measures. Even when such indices are obtainable, further research is needed to determine the accuracy and applicability of such techniques in a field setting.



Rating Techniques

Observations by trained experts such as clinicians or time-study men have been used to assess effort. Early judgmental indices were used to rate a person's work pace as being faster or slower than "normal" (Strauss and Sayles, 1960). For example, time studies (Barnes, 1940) require the observer to time each separate operation of the task and rate the worker on skill, effort, consistency, and the conditions under which the study was conducted. These ratings are then numerically adjusted to correct for differences in observation time. Presgrave (1945) developed a time-study technique wherein measures of effort were determined by variations in the speed to complete a task. It would seem that such a method, based solely on speed, would also include the effects of skill, and therefore not present a clear assessment of effort. By classifying workers according to skill level, Ryan (1947) refined the criteria for determining levels of effort from observers' jottings.

One difficulty with observer ratings of effort is that raters may be responding to performance rather than effort. Braunstein, Braunstein, and Blumfield (1965) assessed the relationship between an overall observed rating of effort and various measures of actual performance. The authors found that effort ratings were related to only three of the six performance measures. They concluded that the raters may indeed have been responding to something other than level of performance. To insulate effort ratings from the effects of knowledge of performance, some researchers (Mitchell, 1966; Hackman & Porter, 1968; and Schneider & Olson, 1970) have devised instruments requiring the raters to focus on effort rather than performance. Such attempts to separate performance from effort seem to be an improvement over previous scales.

Since judgments of effort certainly have a subjective component, self-ratings have also been used to measure effort. Obviously, judges or raters are not required, nor is any mechanical device needed to make this type of effort determination. Typically, self-ratings simply require an individual to rate himself on some type of scale according to the amount of effort put into a specific task or job.

Thorndike (1913) asked subjects to rate their effort on various parts of a learning task. Furst (1966) developed an effort scale and a measure of motivation. Subjects rated themselves on a five-point scale for nine effort statements. Furst found that this effort rating instrument correlated higher with a measure of motivation than did an achievement measure. Employee attitude scales pertaining to different aspects of the job have also been used to measure effort (Lawler and Porter, 1967).






Because of their subjective nature, both observer and self ratings of effort may be contaminated by the same factors that bias all rating scales: for example, response set, halo effect, and leniency effect. This problem, in addition to questionable validity and reliability, suggests that a more objective technique for measuring effort would be desirable.

Behavioral Techniques

Researchers in learning theory have used certain behaviors of human and infra-human organisms as indices of effort. One approach establishes a motivational state by exposing the organism to a set of antecedent conditions, and then observing the change in the organism's behavior. This behavior change is considered a measure of motivation. For example, in the case of hunger, motivation would be operationalized by subjecting laboratory animals to various degrees of food deprivation. The rate of response behavior (bar pressing) would be the measure of motivation or effort. Obviously, such a rate of response measure is of little value when dealing with the complex nature of human effort.

Another approach, more useful with humans, is simply to measure behaviors that appear to correlate with effort. Davis (1939) found certain movements of the right arm to be associated with the effort experienced in the solution of arithmetic problems. Luchins and Luchins (1954) employed a mirror-tracing task and observed that fidgeting and sweating accompanied high levels of mental effort. Yacorzynski (1942) found some evidence to indicate that time taken to complete a given task was related to effort. Such behavioral indices are subject to a variety of problems. Measured arm movements, fidgeting, and sweating are potentially confounded by any individual or situational variable that also might cause such behaviors. Task completion time as an effort measure does not account for ability differences. Thus it seems that behavioral measures are subject to several specific problems, as well as the general problems of rater bias and unreliability. Such problems significantly reduce the desirability of using behavioral indices of effort.

Computational Techniques 

Another general approach to the measurement of effort is based simply on computationally deriving an effort score. Educators interested in obtaining such effort scores typically use the ratio or difference between achievement and intelligence scores. The simplest of these indices of effort is the Accomplishment Quotient (Pintner, 1920). The AQ is the ratio of an individual's actual rate of educational progress (Educational Quotient) to the potential rate of progress (Intelligence Quotient). According to HaY^n (1931), this measure of effort evaluates the accomplishment of an individual in terms of his own ability. Deviations above and below 1.00 indicate the degree of effort expenditure and ability utilization in various performance tasks.
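As a minimal illustration, the Accomplishment Quotient is a single division of the two quotients. The sketch assumes EQ and IQ are already expressed on the same quotient scale; the function name and the example values are hypothetical.

```python
def accomplishment_quotient(eq: float, iq: float) -> float:
    """Accomplishment Quotient (Pintner, 1920): AQ = EQ / IQ.

    Values near 1.00 indicate achievement in line with ability; values above
    or below 1.00 are read as greater or lesser effort.
    """
    if iq <= 0:
        raise ValueError("IQ must be positive")
    return eq / iq


# Hypothetical student: educational quotient 110, intelligence quotient 100.
print(round(accomplishment_quotient(110, 100), 2))  # 1.1
```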



The Efficiency Ratio (ER) has also been used as an index of effort (Ford, 1931). It is computed as:

$$ER = \frac{S}{Av\,(IQ/100)} \times 100$$

where $S$ is the individual's score on an achievement test, $Av$ is the average of an experimental group on the same achievement test, and $IQ$ is the individual's intelligence quotient. A resulting score of 100 indicates that the individual is exerting normal or average effort. Scores above 100 reflect above average effort, whereas scores below 100 indicate below average effort.
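A short sketch of the Efficiency Ratio as written above: the group average, scaled by the student's IQ/100, serves as the achievement score expected at that level of intelligence. The function name and the example values are hypothetical.

```python
def efficiency_ratio(score: float, group_average: float, iq: float) -> float:
    """Efficiency Ratio (Ford, 1931): ER = S / (Av * IQ/100) * 100.

    The denominator scales the group's average achievement by the student's
    IQ, giving the score expected for that level of intelligence.
    """
    expected = group_average * iq / 100
    return score / expected * 100


# Hypothetical: group average 50, student IQ 120, so the expected score is 60.
print(efficiency_ratio(60, 50, 120))  # 100.0
```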

The Effort Quotient (FQ) also uses a ratio technique to measure effort (Tsao, 1943):

$$FQ = \frac{E}{\text{predicted } E} \times 100$$

where $E$ is an individual's educational score, and predicted $E$ is the predicted value of the educational score based on an intelligence measure. More specifically, the regression equation used to predict an individual's educational score is:

$$\text{predicted } E = M_E - b_{EI} M_I + b_{EI} I = b_{EI} I + (M_E - b_{EI} M_I)$$

where $M_I$ and $M_E$ are the mean values of intelligence and achievement, respectively, and $I$ is the known individual intelligence score. The final component, $b_{EI}$, is determined by:

$$b_{EI} = r_{EI} \, \frac{\sigma_E}{\sigma_I}$$

where $r_{EI}$ is the correlation between the intelligence and achievement scores, and $\sigma_I$ and $\sigma_E$ are the standard deviations of the intelligence and achievement scores, respectively. If the resulting value for FQ is 100 (actual educational score equals predicted score), then according to Tsao, the individual is exerting normal effort. Values of FQ above or below 100 reflect higher or lower amounts of effort, respectively.

A common difference technique used to assess effort is the Effort Score (McCall, 1930):

$$F = T_e - T_i + 50$$

where $T_e$ and $T_i$ are T scores on an achievement and an intelligence test, respectively (one T-score unit is 1/10 of the standard deviation of the experimental group on the particular measure). An individual whose $T_e$ score is equivalent to his $T_i$ score is said to be exerting normal effort. Scores above 50 reflect above-normal effort, and scores below 50 below-normal effort.
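The Effort Score can be sketched by first converting each measure to T scores (mean 50, standard deviation 10) within the experimental group. The sketch assumes the population standard deviation of the group is used; the function names are hypothetical.

```python
from statistics import mean, pstdev


def t_scores(values: list[float]) -> list[float]:
    """T scores: mean 50, one standard deviation = 10 T-score units."""
    m, s = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / s for v in values]


def mccall_effort_scores(achievement: list[float], intelligence: list[float]) -> list[float]:
    """Effort Score F = T_e - T_i + 50; a score of 50 marks 'normal' effort."""
    t_e, t_i = t_scores(achievement), t_scores(intelligence)
    return [te - ti + 50 for te, ti in zip(t_e, t_i)]
```

Note that $F$ compares standing on the two tests directly and ignores how strongly they correlate, which is the objection Tsao raises.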

In a review of the previously mentioned techniques for measuring effort, Tsao pointed out that the AQ score, the F score, and the ER score fail to consider the correlation between intelligence and education. He concluded, therefore, that each of these techniques gives a biased estimate of effort.

Similar techniques have been used in other areas of education. One other common approach to determining over- and under-achievement is the derivation of a discrepancy score. This particular technique is computed simply by subtracting the aptitude score for an individual from his achievement score. The resulting residual is a measure of over- or under-achievement. That is, a negative discrepancy score represents under-achievement, and a positive one over-achievement. Because this crude difference score leads to a systematic negative bias for those individuals high in aptitude, and a positive bias for those low in aptitude, Thorndike (1963) suggests that achievement be predicted from aptitude on the basis of the known correlation between the aptitude measure and the achievement measure (this is similar to the FQ technique). This prediction or regression equation will give the average achievement score for individuals at any given aptitude level. The predicted value will then be an unbiased estimate of achievement.

Summary of Effort Measures

The techniques discussed, particularly rating scales, tend to be too subjective to adequately assess effort. Of concern are the effects of rating biases which may serve to contaminate the measure. That is, raters may be responding to a dimension other than effort. Further, the question remains relatively unanswered as to whether physiological measures or rate of responding measures do indeed reflect effort. These techniques focus only on output, and therefore may not be completely representative of an individual's effort.

Given that a motivated individual is one who exerts both physical and mental effort, effort appears to be defined along more than a single dimension. What is needed, then, is an approach which is reflective of both the physical and mental aspects of effort, and at the same time is an objective and valid representation of the construct. The work of Tsao with the Effort Quotient, and the work of Thorndike with over- and under-achievement, appear to come closest to satisfying these criteria. Although Thorndike's measure is not conceptualized in terms of effort, it does afford an indirect and less subjective approach to the measurement of over- and under-achievement. Further, this measure has been connected to motivation (effort) by a few authors (Appelzweig, Moeller and Burdick, 1956; Mayo and Manning, 1961; and Farquhar, 1963). In each of these studies, the terms over- and under-achievement have been used as the operational definitions of high and low motivation. That is, this discrepancy or residual score was considered a measure of motivation.

The Effort Quotient (FQ), while similar to Thorndike's technique, uses a ratio approach for assessing effort, rather than a difference score. Further, since the FQ score was developed in terms of effort, it may also be valuable in the construction of an effort measure.


Conclusions from the Literature Review

Clearly there have been many approaches to measuring effort. Most of them, however, have serious limitations for use in an Air Force training context. Ratings and physiological measures would not only be difficult to collect, but their validity is questionable.






The approach taken by the present research is to deal with the derived effort measure originally developed by Pritchard, Von Bergen, and DeLeo (1974). The approach assumes that performance is largely a function of ability and motivation (effort). Given this assumption, which can be empirically tested, one can get a measure of effort by partialling out the influence of ability on performance. The residual could then be considered a measure of effort. Specifically, one could generate a prediction equation from ability test data which taps the relevant ability domains of a given training course. Once this equation is generated, it could be used to calculate a predicted score (a level of performance and/or speed of completion) for a given individual on a given segment of a course. Conceptually, this predicted score would be the mean actual performance of a group of students with similar sets of ability scores. Thus, since ability is held constant within such a group, variations in performance should be due to variations in effort.

This predicted score for each individual can then serve as the mechanism for determining effort. This would be accomplished by subtracting the predicted score from the student's actual score. This residual could then be considered a measure of effort, and the higher the score, the more effort the student exerted.
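The residual logic above can be sketched in Python. This is a minimal illustration under stated assumptions, not the report's procedure: it fits an ordinary least-squares equation by solving the normal equations directly (the report used build-up stepwise regression), and the names `fit_ols`, `predict`, and `derived_effort`, as well as the data, are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares weights [b0, b1, ..., bk] from the normal equations (X'X)b = X'y."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend an intercept term
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [v - f * w for v, w in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(weights, x):
    """Ability-based predicted performance for one student."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def derived_effort(weights, X, y):
    """Additive derived effort: actual score minus the ability-based predicted score."""
    return [actual - predict(weights, x) for x, actual in zip(X, y)]


# Hypothetical data: two ability scores per student and a percent-correct performance score.
abilities = [[95, 20], [110, 24], [100, 30], [120, 18], [105, 26], [90, 22]]
performance = [78.0, 90.0, 88.0, 86.0, 91.0, 70.0]

weights = fit_ols(abilities, performance)
effort = derived_effort(weights, abilities, performance)
```

Because least squares with an intercept forces residuals to sum to zero, these effort scores are relative to the group. For a criterion where lower is better, such as time to complete, the sign of the residual would presumably need reversing so that higher values still mean more effort.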

It is obvious that the entire system rests on the assumption that technical school performance is largely a function of ability and effort, and that when ability is partialled out, the remaining variance in performance is highly saturated with variance in effort. It is precisely this assumption that the present research was designed to test. Before discussing the research plan, however, let us consider some of the advantages of an incentive system which gives rewards based on effort.

First, such a system is individualized. Each person gets a predicted score based on his own pattern of abilities. Second, the system would reward effort rather than performance. The advantages of this approach were discussed earlier. Third, the system makes the incentive system equitable, in that all students have an equal chance to earn incentives. Related to this is the advantage that all students, not just the high-ability students, should be motivated to exert high effort. Fourth, the measure of effort could serve as a very useful overall measure of the effectiveness of changes in the course. Specifically, innovations and constant changes will undoubtedly be made in a course, and the problem is how to assess the impact of these changes. Our own experience has shown that the ability levels of students entering any given technical school course can vary greatly over a short period of time. This makes assessing the effects of any changes very difficult without somehow partialling out ability. The system proposed here does this automatically and thus is directly interpretable. Other, less obvious advantages would include the identification of problem students and use for counseling purposes. For example, the system would easily identify a student of high ability who was barely passing the course (low effort). Finally, the system would be useful for other motivational applications. For example, the target score could be used as the basis for various types of goal-setting procedures.




Method and Procedures

Overview

Initially, it was necessary to select an ongoing AF technical training course to use as the target course and to examine the course materials in order to assess the abilities required by that material. Based on this analysis, a battery of ability tests was formed to tap the relevant abilities. These tests were then given to a sample of AF trainees in the selected course. Regression equations were then developed to predict performance in the target course. The next step was to develop a simulation of the selected course using civilian subjects. These simulation subjects took the same battery of ability tests and worked on the same training materials as did the actual AF subjects. A second set of regression equations was developed for the simulation sample.

Derived effort scores were then calculated, using both the Air Force-generated weights and the weights generated within the simulation sample itself, by subtracting from actual performance the performance level predicted on the basis of ability. The final step consisted of comparing the derived effort scores of the simulation subjects to a hard criterion of effort which was based on a photographic record. The greater the relationship between the derived effort measure and the hard criterion of effort, the better the derived effort measure could be said to be measuring effort.

Selection of Task Material

Four criteria were used in the selection of task material. The first was that it be part of an ongoing Air Force technical course. This was necessary to be able to generalize maximally to the Air Force context. Second, it had to be self-paced. Since the AIS, for example, is a self-paced system, the findings would be more usable if a self-paced course was utilized in the simulation. Third, it had to use programmed texts. It was not feasible to have a fully trained instructor in the simulation, and thus the use of programmed texts was felt necessary. In addition, the AIS will use programmed materials. Finally, for logistical reasons, it had to be a section of materials which did not require extensive training equipment.

A number of possibilities were examined, and ultimately the Aircraft Electrical Repairman (AER) course (3ABR42330) was selected. This is a self-paced course utilizing programmed texts which covers the fundamentals of electricity and the maintenance and repair of electrical systems in aircraft. Although the complete course requires extensive training equipment and skilled instructors, the first section of the course covers more basic material and does not require equipment.

The section of the AER course used in this study consisted of the following programmed texts:

Aircraft Familiarization
Elements of Physics and Mechanics
Electron Theory
Magnetism
DC Generation and Basic Circuit Symbols and Terms
Series Circuits Wiring Diagrams

Because this [illegible]-hour section occurred early in the sequence of course work, students were not required to use any training equipment. As a whole, the programmed texts were introductory in nature and were designed to provide the basic knowledge and background necessary for the completion of subsequent course work.

The specific learning objectives provide an excellent description of the content area of each programmed text. In Aircraft Familiarization, students were required to identify aircraft components, aircraft movements, and the direction of aerodynamic forces from diagrams. Also required was a working knowledge of the alphanumeric aircraft designation system.

Elements of Physics and Mechanics was used to train students in the principles and methods of using simple machines. Also required was a knowledge of the causes and controls for various types of corrosion. Students had to identify the effects of pressure and temperature changes on solids, liquids, and gases.

Successful completion of Electron Theory required an understanding of subatomic particles as well as the principles, symbols, and measures associated with voltage, resistance, current, and conductance.

The programmed text Magnetism dealt with the characteristics of permanent magnets and electromagnets. Students were required to grasp basic concepts of magnetism and had to identify electromagnetic effects from diagrams.

DC Generation and Basic Circuit Symbols and Terms was used to teach students the basic operation of electrical components associated with direct current generation. Additionally, the text was used to train students to recognize certain electrical symbols and terms.

The final text, Series Circuits Wiring Diagrams, was used to train students in the use of electrical station numbering systems. Terms and procedures associated with this system were stressed.

Ability Testing 

Once the materials were selected, they were carefully examined to identify the ability dimensions required by the materials. This was done on an intuitive basis. Ultimately, a battery of five tests was selected.

These tests were selected because they were 1) designed for standardized group administration, 2) highly reliable and valid, and 3) relevant to the ability requirements of the task material. The Otis-[illegible] Mental Ability Test was used to measure general intelligence. General abilities related to logic, mathematics, and vocabulary are measured via this instrument. Such abilities are considered relevant to nearly any learning task.

The Paragraph Meaning Test was selected from the most current version of the Stanford Achievement Test Battery. This test was selected to tap reading comprehension skills. The format of this test closely parallels the format of a programmed text: a paragraph is presented, and related questions and completion items follow immediately. It was reasoned that the ability to comprehend and respond to a small segment of written material would be very relevant to a programmed learning task.

While progressing through the task material, the subjects were also required to interpret figures and diagrams. The Study Skills test of the 1952 version of the Stanford Achievement Test Battery was selected since it dealt primarily with the extraction, synthesis, and interpretation of information presented graphically and diagrammatically.

Also important to completing the programmed texts were very basic mechanics and an understanding of topics such as simple machines, magnetic lines of force, and electron flow. To measure related abilities, the Mechanical Reasoning and Space Relations sections of the Differential Aptitude Tests were administered.

Once selected, the battery was administered to a sample of AF trainees in the AER course. This was accomplished by giving the three-hour test battery to students as they entered the course. A member of the research staff administered the battery to students on their first day of class. Trainees were told that the tests were important in that they tapped abilities relevant to doing well in this course, and that they should do their best on the tests.

Dependent Variables

Final data collection in the Air Force context consisted of performance data over the first six PTs for those trainees who had taken the ability test battery. Data were collected on three variables. The major dependent variable was time to complete the first six PTs. Also, data on scores (percent correct) for each PT were collected. Finally, when the trainee had finished the six PTs, he was given a specially designed comprehensive test over all the material covered in the six PTs.

Data on the first two variables were collected by having the instructor record, on a specially designed form, the time a student started a given PT, when he took the appraisal test for that PT, his score on the appraisal, and the time he started the next PT. If the student failed to pass the PT, this was also recorded. Thus, it was possible to calculate the total time spent on the PTs as well as the mean appraisal score.

The comprehensive test was developed especially for this project to cover all the material in the six PTs. Sixty items were initially compiled. Many items were newly developed, while others paralleled items found in course appraisals and criterion tests. These items were evaluated for accuracy of content by AER course instructors and then administered to 32 students in the program. Based on an item analysis of their responses, a final 55-item version of the comprehensive test was developed.

The Work Simulation

The second phase of the research was to design a simulation of the AER course, collect the same ability and performance data, and obtain a hard criterion of effort.

Task Material 

The first six PTs of task material used in the simulations were identical to those used in the actual AER course. Copies of the actual PTs were made, and the same exact tests were used. Since the major dependent variable was the time it would take to finish the six PTs, it was important that every subject actually finish the material during the simulation. Thus, it was arranged so that, based on available Air Force experience, everyone should be able to finish in the 20-hour working time scheduled for the simulation. However, it was anticipated that most subjects would finish in less than 20 hours, and the better students were expected to finish much more quickly. Consequently, additional task material was generated for use when the subject had finished the first six PTs. This material was based on published programmed texts in electricity and electronics (New York Institute of Technology, 1963, 1964) and had been used successfully in a similar setting by Pritchard, Leonard, Von Bergen, and Kirk (1974). However, for the purposes of this report, only the data from the first six PTs are relevant.

Subjects

It was felt important that the subjects selected for the simulation be as similar as possible to the trainees in the AER course. Thus, an attempt was made to recruit subjects of the same age range and general ability level as those in the AER course.

The simulation was actually composed of three separate data collections in three separate cities in Indiana. Approximately two weeks before each simulation was due to start, advertisements were placed in local newspapers, and flyers were distributed in the area describing the job and telling subjects where to report. (See Appendix A for a copy of the advertisement.) It was planned to have 20 subjects in each of the three data collections, and the advertisements were quite successful in that, in each condition, more than twenty people showed up for the job.

Procedure

Once the subjects arrived, they were given an application blank and the job was described to them. They were told that we were interested in a new method of technical training which involved programmed texts, and that they would be working with these programmed texts in electricity and electronics. It was pointed out that no special skills were required, and that it was not necessary that they have any previous experience or knowledge of electricity or electronics. At that point, anyone not interested in pursuing the job was told that they could leave. No one, in fact, actually left.

They were then told that, as the ad stated, we only needed 20 people. Since more than 20 were present, some of them could not be hired. They were then given a short test of knowledge about electricity and electronics. This test dealt with questions which would not generally be familiar to someone without some background in this area. Examples were "Define static discharges," "magnetic permeability," etc. (A copy of the test appears in Appendix B.)

Although subjects were advised to do as well as possible on the test, its actual purpose was to eliminate those people who had some knowledge of electricity and electronics.

Anyone who got more than one answer correct on this test was eliminated. Of those applicants remaining, 10 males and 10 females were randomly selected. The rest of the applicants were thanked and dismissed.
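The screening and selection rule can be sketched as follows. The `(name, sex, n_correct)` tuple layout, the function name, and the applicant pool are hypothetical; the rule itself (eliminate anyone with more than one correct answer, then randomly pick 10 males and 10 females) follows the text.

```python
import random


def screen_and_select(applicants, per_sex=10, max_correct=1, seed=None):
    """Drop anyone with more than `max_correct` right answers, then sample `per_sex` of each sex.

    `applicants` holds hypothetical (name, sex, n_correct) tuples, with sex "M" or "F".
    """
    rng = random.Random(seed)
    eligible = [a for a in applicants if a[2] <= max_correct]
    selected = []
    for sex in ("M", "F"):
        candidates = [a for a in eligible if a[1] == sex]
        selected.extend(rng.sample(candidates, per_sex))  # ValueError if fewer than per_sex remain
    return selected


# Hypothetical applicant pool for one city.
applicants_today = ([(f"m{i}", "M", i % 2) for i in range(14)]  # 0 or 1 correct: eligible
                    + [(f"x{i}", "M", 3) for i in range(4)]     # too much prior knowledge
                    + [(f"f{i}", "F", 0) for i in range(12)])
hired = screen_and_select(applicants_today, seed=7)
```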

The remainder of the first day consisted of giving the subjects the battery of ability tests used in the Air Force sample, explaining the task in detail, and giving subjects some practice working on a sample PT.

In order to assess the utility of the derived effort measure, it was necessary to assure that some variability in effort was, in fact, present. To this end, three conditions were utilized, varying in the type of pay system employed. The first was an hourly system whereby subjects received $2.00 per hour. The second was essentially a piece rate. Each PT was given a dollar value, and when the subject passed the appraisal for the PT, he received that amount of money. The more PTs finished, the more money he would earn. (All money earned was paid at the end of the week.) This "value" of each PT was based on the data from the hourly condition. If, on the average, hourly subjects took, e.g., 2 hours to finish a given PT, this value was multiplied by the $2.00/hour rate to get the "value" of that PT. In this example, it would be $4.00. Thus, if subjects in the piece-rate condition worked at an average pace, they would make $2.00 per hour. If they worked faster, they would earn more.
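The piece-rate valuation is a single multiplication per PT. A minimal sketch; the mean completion times are hypothetical, and only the $2.00 rate and the 2-hour/$4.00 example come from the text.

```python
HOURLY_RATE = 2.00  # dollars per hour in the hourly condition


def piece_rate_values(mean_hours_by_pt, rate=HOURLY_RATE):
    """Dollar value of each PT: mean hours hourly subjects needed for it times the hourly rate."""
    return {pt: round(hours * rate, 2) for pt, hours in mean_hours_by_pt.items()}


# Hypothetical mean completion times from the hourly condition.
values = piece_rate_values({"PT1": 2.0, "PT2": 3.5})
print(values)  # {'PT1': 4.0, 'PT2': 7.0}
```

A subject who passes PT1 in 1.5 hours would earn $4.00 / 1.5 hours, about $2.67 per hour, which is the intended incentive to work faster than the hourly average.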

The third condition was similar to the piece-rate or fixed ratio (FR) condition in that pay was contingent on performance, but the actual pay schedule was different. The third condition utilized a variable ratio-variable amount (VRVA) schedule. In this condition, subjects did not know how much a given PT was worth, since its "payoff" varied from $0 to $6.00 times the number of hours taken to complete it in the hourly condition. Thus, for a PT which took 2 hours to complete in the hourly sample, subjects in this condition could earn $0 through $12.00. The determination of which level of pay they actually received was random, but set so as to average $2.00 per hour if performance equalled that in the hourly sample.
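A VRVA schedule can be sketched as a random draw from a set of per-hour payoff levels. The report gives only the $0 to $6.00 range and the $2.00 average, so the three levels and their probabilities below are hypothetical values chosen to satisfy those two constraints.

```python
import random

# Hypothetical per-hour payoff levels and probabilities (not from the report), chosen so
# that payoffs span $0-$6.00 per hourly-condition hour and average $2.00 per hour.
LEVELS = (0.0, 2.0, 6.0)
PROBS = (0.4, 0.4, 0.2)


def expected_rate(levels=LEVELS, probs=PROBS):
    """Expected payoff per hourly-condition hour."""
    return sum(level * p for level, p in zip(levels, probs))


def vrva_payoff(hourly_hours, rng):
    """Random payoff for one passed PT: a drawn per-hour level times the PT's hourly-condition time."""
    level = rng.choices(LEVELS, weights=PROBS, k=1)[0]
    return level * hourly_hours


rng = random.Random(0)
payoffs = [vrva_payoff(2.0, rng) for _ in range(10_000)]  # a 2-hour PT pays $0, $4, or $12
```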

The three conditions were run independently in three different cities, and subjects in one condition were exposed only to that condition. The system was explained the first day, and subjects worked the following four days under the appropriate system. Including the first day used for testing and orientation, subjects were in the simulation for five days, five hours per day.

At the start of the second day, all subjects were given the first PT and told to start. When they felt they knew the material, they approached the instructor and were given the appraisal for the first PT. (A sample appraisal appears in Appendix C.) When completed, it was scored by the instructor. If the subject passed the appraisal, he was given the second PT. If he failed the test (75% correct was the criterion for passing), he was told to re-study the PT. When he felt he was ready, he re-took the appraisal; actually, this was another form of the appraisal. He continued this procedure until the appraisal was passed. When a subject finished the set of six PTs, he was given the comprehensive test covering the whole set of PTs, the same test given to the Air Force trainees. The subject was not required to reach any set criterion score on this test. Subjects were then given a brief interview and completed a questionnaire (see Appendix D). Once they had completed this, they started on the "new" material and worked on it for the rest of the week. Throughout the entire work week, subjects could take breaks whenever they wished, for as long as they wished. A separate break area was provided, and coffee, soft drinks, and doughnuts were available.
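The mastery loop for one PT can be sketched as below. The function names and scores are hypothetical. The mean follows the rule stated under the regression equations: only the score on the passing appraisal enters a subject's mean appraisal score.

```python
PASS_CRITERION = 0.75  # proportion correct required to pass an appraisal


def work_through_pt(form_scores):
    """Return (attempts, passing_score) for one PT.

    `form_scores` gives the proportion correct on successive appraisal forms; after each
    failure the student re-studies and takes another form of the appraisal.
    """
    for attempt, score in enumerate(form_scores, start=1):
        if score >= PASS_CRITERION:
            return attempt, score
    raise ValueError("appraisal was never passed")


def mean_appraisal_score(results):
    """Mean of the passing scores only; failed attempts do not enter the mean."""
    return sum(score for _, score in results) / len(results)


# Hypothetical scores on successive appraisal forms for three PTs.
results = [work_through_pt(s) for s in ([0.80], [0.60, 0.90], [0.70, 0.70, 0.75])]
print(results)  # [(1, 0.8), (2, 0.9), (3, 0.75)]
```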

The hard criterion of effort was based on a photographic record. Each working day, two 8 mm movie cameras took single-frame pictures of the working area. Each subject's work place was clearly visible. The cameras took one frame every sixty seconds for the entire day. The cameras were clearly visible to the subjects, and actually made an audible click when each frame was taken. However, subjects quickly adapted to the cameras, and by the second day, when the effort data were actually collected, there was absolutely no evidence that the subjects were paying any attention to the cameras. As will be discussed below, the measure of effort consisted of the percent of time the subjects actually spent working on the task material.
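Because frames were taken at a fixed interval, the percent of time working can be estimated as the percent of frames in which a subject was working. A minimal sketch; how frames were actually coded is described later in the report, so the boolean coding and the sample day here are hypothetical.

```python
def percent_time_working(frame_codes):
    """Percent of sampled frames coded as working on the task material.

    With frames taken at a fixed interval, each frame stands for an equal slice of time,
    so the share of "working" frames estimates the share of time spent working.
    """
    if not frame_codes:
        raise ValueError("no frames were coded")
    return 100.0 * sum(frame_codes) / len(frame_codes)


# Hypothetical coding of one 5-hour day at one frame per minute (300 frames).
day = [True] * 240 + [False] * 60
print(percent_time_working(day))  # 80.0
```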

Results

Overview

Analysis of the data consisted of four basic steps: 1) generating the regression equations, 2) computing the derived effort scores, 3) evaluating the hard criterion of effort, and 4) examining the predictive validity of the derived effort scores. Before turning to these topics, we shall first address the issue of the comparability of the Air Force and simulation subjects.

Comparison of Air Force and Simulation Subjects

Table 1 presents comparisons of the two subject groups in terms of age and the five ability tests (IQ, Paragraph Meaning, Mechanical, Spatial, and Study Skills). These data indicate that the Air Force trainees tended to be slightly older than the simulation subjects. This is due primarily to the fact that the Air Force group included some trainees who had been in the service for some time but had returned for retraining in this career field. In fact, 79% of the Air Force trainees were between 17 and 20 years old. Thus, except for the retrainees, the groups were comparable in age. Although no actual education data are available, our experiences with this course (Pritchard, DeLeo, and Von Bergen, 1974) suggest that almost all of the Air Force trainees had some high school and about 70% had completed high school. This corresponds closely to the amount of education of the simulation subjects. The table also indicates that in terms of IQ and Paragraph Meaning, the ability level of the simulation subjects was higher. However, the two groups were about equal in terms of Mechanical, Spatial, and Study Skills ability. For both groups, substantial variability existed in ability.




Table 1

Comparison of Air Force and Simulation Subjects

                        AF Subjects (N=187)          Simulation Subjects (N=60)
Variable                Mean    S.D.    Range        Mean    S.D.    Range
Age                     19.3    2.4     17-32        17.4    .6      17-19
IQ                      92.6    9.2     70-127       107.9   15.9    63-137
Paragraph Meaning       80.6    21.9    0-124        101.5   21.2    48-127
Mechanical              24.6    4.6     9-35         24.8    4.9     14-36
Spatial                 15.4    5.6     0-28         16.5    6.5     5-28
Study Skills            23.2    (illegible)  9-34    26.1    4.9     11-33



Generation of the Regression Equations

The basic strategy here was to attempt to predict performance on the training materials from the ability data. This predicted performance score would then be compared to the actual performance score to obtain the derived effort measure. The optimal procedure would be to generate a regression equation on the Air Force trainees and apply this equation to the simulation subjects. This would eliminate the possibility of capitalizing on chance that would exist if the equation were based on data from only the simulation sample. Thus, the primary regression analyses were done with the Air Force data.

However, it was possible that the Air Force equations simply would not fit the simulation data. The subjects in the simulation were of higher ability, and although the learning situation in the simulation was similar to the Air Force setting, the two were, of course, not exactly the same. Thus, equations were also constructed for the subjects in the simulation.

Four performance criteria were used as dependent variables in the regression equations. The first was total PT time. This is the total amount of time a subject took to complete the six target PTs, less the time taken to complete the appraisals. Since the time to take the appraisal is really testing time, it is not, strictly speaking, time on the PTs themselves. The second performance measure was the average of the scores on the six appraisals. It was based only on the scores of the appraisals that were passed. That is, if a subject took the test and failed on his first attempt but passed on his second, only his second score would enter the calculation of his mean appraisal score. The third measure was his score on the comprehensive test, which was taken after the last PT was completed and covered the material on all six PTs. The final score was an overall performance measure composed of the three previous variables. This score was calculated for each subject by weighting the time-to-complete score twice as heavily as the sum of the average appraisal score and the comprehensive test score. For this calculation, the following equation was used:

$$\text{Composite Score} = 2\,\frac{2000 - \text{Minutes to Complete PTs}}{\sigma_{\text{minutes}}} + \frac{\text{Mean Appraisal Score}}{\sigma_{\text{mean appraisal score}}} + \frac{\text{Comprehensive Score}}{\sigma_{\text{comprehensive score}}}$$



Time to complete (reverse scored) was weighted more heavily since Air Force personnel felt speed was the performance variable of prime concern.
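A minimal sketch of the composite calculation, assuming the equation as reconstructed from the scanned original: each component is divided by its standard deviation, time is reverse scored by subtracting it from 2000 minutes and given a weight of 2. It further assumes σ is the sample standard deviation across subjects (the scan does not say which estimator was used). The function name and data are hypothetical.

```python
from statistics import stdev


def composite_scores(minutes, appraisal, comprehensive, ceiling=2000.0):
    """2*(ceiling - minutes)/SD(minutes) + appraisal/SD(appraisal) + comprehensive/SD(comprehensive).

    Reverse scoring (ceiling - minutes) leaves the SD unchanged, so SD(minutes) is used directly.
    """
    sd_m, sd_a, sd_c = stdev(minutes), stdev(appraisal), stdev(comprehensive)
    return [2 * (ceiling - m) / sd_m + a / sd_a + c / sd_c
            for m, a, c in zip(minutes, appraisal, comprehensive)]


# Hypothetical subjects: minutes on the six PTs, mean appraisal score, comprehensive score.
scores = composite_scores([600, 900, 1200], [90, 85, 80], [40, 45, 50])
```

Dividing by each SD puts the three components on a common scale before weighting, so the factor of 2 on time is not swamped by differences in raw units.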

In order to deal with possible shrinkage, regression equations were developed using a double cross-validation procedure. The sample was randomly split into two equal groups, and build-up stepwise regressions were calculated for each sample, A and B. The weights developed in sample A were then applied to sample B, and the weights developed in B were applied to A.

Results of regression analyses on the Air Force data are presented in Table 2. The development and cross-validation analyses are presented for each of the four performance variables. For each analysis, the specific ability tests entering the equation are presented in order of their entry into the equation, together with the resulting multiple R at each step.

Inspection of the table indicates that prediction was generally quite good. The best overall index of predictability is the composite score, and for this measure the multiple R for the total sample was .59. This compares very favorably with typical selection studies, where predictive validities generally range in the .40s and .50s. The table also indicates that the equations are quite stable. Cross-validated Rs are quite close to the magnitude of the Rs based on the development sample. The average shrinkage was only .03 correlation units.

Table 2

Ability-Based Regression Equations Predicting Criteria Using AF Sample

Total PT Time
    Total sample (N=187):   IQ .40**, Study .43**, Para .44**, Mech .44**, Spatial .44**
    Sample A (N=90):        IQ .45**, Study .48**, Para .49**, Spatial .49**, Mech .49**
    Sample B (N=97):        Para .33*, Mech .38*, Study .40*, IQ .40*, Spatial .40*
    Cross-validated R:      .46** (only one value legible in the source)

Average Appraisal Score
    Total sample (N=191):   IQ .48**, Spatial .54**, Mech .54**, Study .54**, Para .54**
    Sample A (N=90):        IQ .50**, Spatial .59**, Mech .59**, Study .59**, Para .59**
    Sample B (N=101):       IQ .47**, Spatial .55**, Para .53**, Study .53**, Mech .53**
    Cross-validated R:      partly illegible (.38**, .47**, and .51** appear; column assignment unclear)

Comprehensive Test
    Total sample (N=191):   IQ .51**, Study .52**; later steps illegible
    Sample A (N=90):        IQ .54**, Spatial .55**; later steps illegible
    Sample B (N=101):       IQ .48**, Study .50**; later steps illegible
    Cross-validated R:      illegible

Composite
    Total sample:           IQ .52**, Study .57**, Mech .58**, Para .59**, Spatial .59**
    Sample A:               IQ .57*, Study .60**, Para .61**, Spatial .61**, Mech .61**
    Sample B (N=97):        IQ .51**, Mech .54**, Para .56**, Study .57**, Spatial .57**
    Cross-validated R:      B to A .60**, A to B .56**

*p < .01
**p < .001

Analogous data for the simulation sample are presented in Table 3. 
These analyses utilize the ability data from the simulation sample to 
predict performance in the simulation. 



Table 3

Ability-Based Regression Equations Predicting Criteria Using Simulation Sample

Total PT Time
    Total sample (N=57):         IQ .40**, Mech .44**, Para .45**, Study .45**, Spatial .45**
    Development sample (N=39):   Mech .46**, IQ .51**, Study .51**, Para .51**
    Cross-validated R (N=18):    .36

Average Appraisal Score
    Total sample (N=57):         Para (illegible), Mech .54***, IQ .54***, Study .54***, Spatial .54***
    Development sample (N=39):   Para .44**, Study .45**, Mech .46**, IQ .46**
    Cross-validated R (N=18):    .68***

Comprehensive Test
    Total sample (N=57):         Para .81***, IQ .85***, Study .86***, Spatial .87***, Mech .87***
    Development sample (N=39):   Para .72***, Spatial .76***, Study .79***, IQ .79***, Mech .79***
    Cross-validated R (N=21):    .90***

Composite
    Total sample (N=57):         IQ .60***, Mech .62***, Study .63***, Spatial .63***, Para .63***
    Development sample (N=39):   IQ (illegible), Mech .64***, Study .64***, Para .65***, Spatial (illegible)
    Cross-validated R:           .63**

* p < .05
** p < .01
*** p < .001

The cross-validation procedure was somewhat different for these data. Due to the small sample size (N=57; three of the 60 subjects did not complete PT 6, so their data could not be used), a double cross-validation procedure was felt inappropriate. Thus, a traditional cross-validation design was employed, in which two-thirds of the sample was randomly selected to constitute the development group and the remaining third the hold-out group.

The data in this table indicate that the level of prediction in the simulation was almost identical to that in the AF sample. For example, the Air Force equations predicted total PT time with an R of .44, while the simulation data provided an R of .45. Analogous multiple correlations for the composite were .59 (Air Force) and .63 (simulation). The simulation equations were also fairly stable; the only equation showing any real shrinkage was for total PT time.

However, even though the magnitude of the Rs in the simulation data is comparable to those in the Air Force data, it should be noted that the order of the predictors entering the equations and the change in R at each step varied from the Air Force to the simulation data. For example, IQ was the best predictor in every case when the total Air Force sample was used, but only in two of the four cases for the simulation sample.

This indicates that the structure of the Air Force regression equations was different from that of the simulation equations. For this reason, it was felt necessary to compare the two sets of equations more directly. Table 4 presents data pertinent to one aspect of this comparison. The regression equations developed in the Air Force sample were applied to the ability data in the simulation sample, and this predicted performance was correlated with actual performance. As the table indicates, use of the Air Force weights with the simulation data results in levels of predictability very close to those obtained when the weights were applied to the samples from which they were generated. Thus, although the actual equations for the Air Force and simulation samples differed, the two sets of equations predict equally well in the simulation data.
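The traditional hold-out design differs from double cross-validation only in how subjects are assigned. A minimal sketch of the random split; the function name and seed are hypothetical, the default two-thirds rule follows the text, and `n_dev` lets the sizes match the development and hold-out Ns listed in Table 3 for total PT time.

```python
import random


def holdout_split(n, n_dev=None, seed=None):
    """Randomly assign subjects 0..n-1 to a development group and a hold-out group.

    By default about two-thirds go to development; pass n_dev to fix that group's size.
    """
    if n_dev is None:
        n_dev = round(2 * n / 3)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_dev]), sorted(idx[n_dev:])


# Sizes matching the development (N=39) and hold-out (N=18) groups for total PT time.
dev, hold = holdout_split(57, n_dev=39, seed=1)
print(len(dev), len(hold))  # 39 18
```

Equations would then be fit on `dev` only, and the cross-validated R computed by applying those weights to `hold`.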

A second method of comparison involves comparing the scores
predicted by the Air Force equations with those predicted by the
simulation equations. To do this, simulation subjects received two
predicted scores for a given performance measure: (1) one based on the
Air Force equation, and (2) one based on the simulation equation. These
two predicted scores were then correlated across the simulation
subjects. The resulting correlations were:



Total PT time           .87
Mean Appraisal Score    .89
Comprehensive Score     .96
Composite               .98



Thus, the two equations produced almost identical rank orderings
of the simulation subjects.

Table 4

Comparison of Predictability of Criteria Using AF Weights Applied to
AF and Simulation Samples

                          AF Weights        AF Weights           Simulation Weights
Performance               Applied to        Applied to           Applied to
Measure                   AF Sample         Simulation Sample    Simulation Sample

Total PT Time             .44*** (N=187)    [illegible] (N=57)   .45* (N=57)
Average Appraisal Score   .54*** (N=191)    .49*** (N=57)        .54** (N=57)
Score on Comprehensive    .54*** (N=191)    .83*** (N=57)        .87*** (N=57)
  Test
Composite Performance     .59*** (N=191)    .63*** (N=57)        .63*** (N=57)

* p < .05
** p < .01
*** p < .001






Generation of the Derived Effort Scores

The basic rationale for the derived effort measure is that if one
partials ability out of performance, the resulting residual should be
highly saturated with effort variance. However, the partialling
procedure can be carried out in basically two ways, following either
additive or multiplicative assumptions. The first derived effort score
assumed that effort and ability summed to equal performance. Consequently,
the derived effort score was calculated by obtaining the predicted
score for a subject on each of the four performance measures using the
prediction equations previously generated. These, of course, were
based solely on ability data, so the predicted score was essentially
the level of performance predicted for a given subject on the basis of
his level of ability. This score was subtracted from his actual
performance score, and this residual constituted the derived effort
measure. This procedure resulted in eight derived effort scores for
each subject: one for each of the four performance variables using the
Air Force weights, and one for each of the four performance variables
using the simulation weights.

The second type of derived effort score was based on a model which
states that performance is a function of ability multiplied by effort.
In the previous additive model, Performance = Ability + Effort; thus,
Effort = Performance - Ability, and subtracting predicted performance
(ability) from actual performance yields the derived effort score. In the
multiplicative model, however, Performance = Ability × Effort; thus,
Effort = Performance / Ability. The derived effort score is therefore
actual performance divided by predicted performance (ability).
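The two scoring rules can be written compactly as below. The notation is ours rather than the report's: P_i is subject i's actual score on a performance measure, X_{ik} his score on ability test k, b_k the regression weights, \hat{P}_i the ability-based prediction, and \hat{E}_i the derived effort score. The ratio form is only defined when the predicted score is positive.

```latex
\begin{align*}
\hat{P}_i &= b_0 + \sum_{k=1}^{K} b_k X_{ik}
  && \text{(ability-based prediction)}\\
\hat{E}^{\mathrm{add}}_i &= P_i - \hat{P}_i
  && \text{(additive model: } P = A + E\text{)}\\
\hat{E}^{\mathrm{mult}}_i &= P_i / \hat{P}_i
  && \text{(multiplicative model: } P = A \times E\text{)}
\end{align*}
```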

Derived effort scores were thus calculated for each subject under
both the additive and the multiplicative approaches. These were then
compared to the hard criterion of effort.

Evaluating the Hard Criterion of Effort

Before turning to the relationships between derived effort and the hard
criterion of effort, it is appropriate to discuss data pertinent to
the evaluation of the hard criterion of effort.

Recall that the effort data came from ratings of 8 mm photographs
of the subjects. A frame was taken every six seconds. For rating
purposes, every third frame was utilized. The frame was projected on a
screen, and raters made a primary judgment as to whether or not the
subject was working on the task in that frame. Subjects were rated for all
the time that they were not actually taking an appraisal. Thus, if a
subject was not in his seat in the picture, and he was not taking an
appraisal, he was counted as not working for that frame. Also, since
the material generally required eye contact to work on, a subject was
scored as not working if he was looking up from the work or talking
to a co-worker. Subjects had been told to work on the materials
independently. This procedure resulted in 800-900 ratings per subject,
per day.

The hard criterion of effort was then calculated for each subject
as the number of frames in which he was working on the material, divided
by the maximum number of frames in which he could have been working
(i.e., appraisal time was removed). This "percentage of time on task"
constituted the hard criterion of effort.
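The percent-time-on-task computation can be sketched as below. The single-letter frame codes and the function name are hypothetical; the rules (rate every third frame, drop frames taken during appraisals, divide working frames by eligible frames) follow the description above. At one frame every six seconds, rating every third frame means each rated frame stands for roughly 18 seconds of class time.

```python
def percent_time_on_task(frame_codes, rate_every=3):
    """Percent of eligible rated frames coded as working.

    frame_codes holds one hypothetical code per photographed frame:
      'W' = working on the task, 'N' = not working, 'A' = taking an appraisal.
    """
    rated = frame_codes[::rate_every]            # only every third frame is rated
    eligible = [c for c in rated if c != "A"]    # appraisal time is removed
    if not eligible:
        raise ValueError("no eligible frames to rate")
    return 100.0 * sum(c == "W" for c in eligible) / len(eligible)


# Hypothetical sequence of frames for part of one session.
session = list("WWWWNWWWWAAAAAAWWWNNNWWWWWWNWW")
print(f"{percent_time_on_task(session):.1f}% time on task")
```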

In evaluating the adequacy of this measure, several criteria were
employed. The first was whether it produced variability. In fact, it
did. The mean percent time on task was 79.7%, with a standard deviation
of 11.7 percentage points. The range was from 48% to 96%. Clearly,
variability was obtained.

The second criterion was the inter-rater reliability of the effort
ratings. To assess this, the two raters independently rated 100 frames
for 10 subjects from each of the three experimental conditions. The
percent time on task was calculated for each of the 30 subjects, once
for rater A and once for rater B. The difference between the percentages
obtained by the two raters was then calculated and averaged across
the 30 subjects. The mean difference in percentages was 3%. Thus,
even with a fairly small sample of behavior, the ratings were highly
reliable.
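The agreement check can be sketched as a mean absolute difference between the two raters' percent-time-on-task scores. The report says only that the differences were averaged; taking absolute values, so that disagreements in opposite directions do not cancel, is our assumption, as are the function name and the example data.

```python
def mean_rater_difference(rater_a, rater_b):
    """Mean absolute difference, in percentage points, between two raters'
    percent-time-on-task scores for the same subjects (same order)."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two equal-length, non-empty lists")
    return sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical scores for five of the double-rated subjects.
rater_a = [82.0, 74.0, 91.0, 66.0, 79.0]
rater_b = [80.0, 77.0, 90.0, 70.0, 78.0]
print(f"mean difference: {mean_rater_difference(rater_a, rater_b):.1f} points")
```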

The third evaluation of the effort measure dealt with construct
validity. If the effort measure is indeed a good one, it should
correlate significantly with actual performance; but since performance
is determined by factors other than effort, the correlation should be far
from perfect. The effort measure correlated -.44 with total PT time,
.14 with average appraisal score, .39 with comprehensive test score,
and .49 with the composite. (Note that a negative correlation with
total PT time was expected, since the greater the effort, the less time
it should take to finish PTs.) The magnitudes of the correlations of
effort with the primary performance variables, total PT time and the
composite, are in the expected range and thus add support to the
validity of the effort measure.

Additional evidence of validity could be assessed by comparing
effort scores across the 6 PTs, and with the performance measures. We
would predict that: (1) effort scores should correlate highly across
PTs; and (2) correlations between PTs should be higher than correlations
of effort with performance. The average correlation between effort
scores across different PTs was .58. Thus, effort scores are fairly
highly correlated, and are correlated more highly with each other than
with the performance measures.

Self ratings of effort were also obtained from the subjects when
they had finished the six AF PTs. Two items were utilized. The first
asked "On this job I am working: ... As hard as I possibly can ... About
average ... I am taking it easy." A nine-point Likert response format
was utilized, with verbal anchors at every other step (see Appendix D
for the actual items). The second question asked "In terms of the
total amount of effort I could put in on this job, I am putting in
about: ... 10% effort ... 50% effort ... 90% effort." As before, a
nine-point scale was used. The sum of the responses to these two items
constituted the self rating of effort index.

The central issue here is how well the hard criterion of effort was
related to the self ratings of effort. The correlation between the
hard criterion and the self rating was .21. While this is statistically
significant, it is quite small.

In theory, this could cast doubt on the validity of the hard
criterion of effort. However, a more parsimonious explanation is that
the self ratings were not particularly valid. The principal reason for
this conclusion is that the self ratings actually correlated more
highly with total PT time (r = .27) than with effort. Apparently, the
subjects were not considering their actual level of energy expenditure
when responding to the items. Furthermore, self ratings of effort have
been shown in a similar context (Pritchard, Van Bergen, and DeLeo, 1974)
to have very low convergent validity.



Predictive Validity of the Derived Effort Measure

The primary method of evaluating the utility of the derived effort
measure is to correlate it with the hard criterion of effort. Table 5
presents these correlations for the derived effort scores based on each
of the four performance dimensions, for both the additive and
multiplicative models, and for Air Force and simulation generated weights.

The overall conclusion from these correlations is that the derived
effort measure does not predict actual effort particularly well. For
example, correlations based on the composite ranged from -.02 to .32.
The table also indicates that the scores based on the multiplicative
model did no better than those based on the additive model. In three
cases the multiplicative model was better, in two cases the additive
model was better, and in three cases they were equal. Furthermore, none
of the differences was of appreciable magnitude.

Table 5

Correlations Between Derived Effort and Actual Effort (N = 57)

                                       Air Force Weights            Simulation Weights
Derived Effort Measure                 Additive   Multiplicative    Additive   Multiplicative
                                       Model      Model             Model      Model

Derived Effort - Total PT Time         -.09       -.18              -.29*      -.30*
Derived Effort - Average Appraisal     -.27*      -.27*             -.03       -.04
  Score
Derived Effort - Comprehensive Score    .23*       .24*              .13        .13
Derived Effort - Composite              .16       -.02               .32**      .32**

* p < .05
** p < .01

One clear finding is that when the derived effort scores are based
on the regression equations calculated from the simulation data, they
predict effort better than when the derived effort scores come from
the independent Air Force sample. For example, for the additive model
the composite derived effort scores calculated from Air Force weights
correlated .16 with effort, but this value increased to .32 when the
simulation-based derived effort scores were used. This occurred in
spite of the fact that these two derived effort scores displayed a
correlation between themselves of .98. One explanation for this
pattern of results is that the Air Force equations predict lower
performance for a given subject than do the simulation equations. This
mean difference would not, of course, affect the correlation between
the two predicted scores: the two equations rank order the subjects
almost exactly the same, but the Air Force prediction is lower. A lower
prediction produces different derived effort scores, since the predicted
score is subtracted from actual performance; a uniform shift alone,
however, would not change their correlation with actual effort. If the
regression line between predicted and actual performance in the Air
Force data were parallel to that in the simulation data, the two sets of
derived effort scores would differ only by a constant, and thus be
equally correlated with actual effort. However, the two regression lines
are not parallel. One explanation for this could be that the larger
number of high ability subjects in the simulation sample influenced the
regression line. This would imply that the relationship between ability
and performance is nonlinear. This will be discussed in more detail later.

Another way of assessing the utility of the derived effort measure
is to deal with it at a more gross level than the accuracy of individual
prediction. Recall that in order to produce variability in effort,
three experimental conditions were employed: Hourly, Fixed Ratio (FR),
and Variable Ratio-Variable Amount (VRVA) pay systems. These three pay
conditions did, in fact, produce variability in performance. The issue
is how well the derived effort measures discriminated the three
conditions. If the derived effort measure is useful, the three conditions
should show an even greater difference in effort than they do in
performance. This should be the case since performance contains variance
due to ability, but mean ability was constant across the three conditions.

Consequently, one-way ANOVAs were calculated using the three
conditions as levels of the factor and (1) performance as the
dependent variable, and (2) derived effort as the dependent variable.
Since the pay system dealt only with the number of PTs passed, rather than
appraisal scores or score on the comprehensive test, only the total PT
time variable was appropriate for these analyses.

The resulting F-ratio for actual time to complete the PTs was 7.[illegible]
(p = .002). The F for the derived effort analysis using Air Force
weights was 11.62 (p < .001), and using simulation weights was 11.89
(p < .001). Since the order of the actual means was in the same
direction for all three analyses, the larger F-ratios for the derived
effort analyses indicate that using derived effort does in fact result
in more precision.

In examining the scatterplots relating derived effort to actual
effort, it was observed that a number of subjects exhibited a specific
pattern of scores. They were subjects of relatively low ability who
finished the material quite quickly, but who spent a relatively small
percentage of time actually working on the task. One explanation for
this pattern of scores was that these subjects were obtaining the
answers to the appraisals from other people. During the conduct of the
simulation it became apparent that this was actually occurring to some
extent.

To deal with this potential problem, a "cheating index" was formed.
It was based on the assumption that someone who finished the last of the
six Air Force PTs substantially faster than they finished the first PT,
and who did poorly on the comprehensive test, was probably getting the
answers to the appraisals from someone else. The logic here was that it
was unlikely that a subject would get the answers from someone else on
the first PT, before they were accustomed to the situation. Thus,
their time to complete the first PT could be viewed as their actual
level of performance. If, by the sixth PT, they were getting the
answers, they should be able to complete it much faster. If, however,
they did not actually know the material, this should show up as a low
score on the comprehensive test taken after the last Air Force PT.
Since no specific score was required to "pass" the comprehensive test,
there would be no pressure to pass answers for it. Furthermore,
these tests were not scored during the simulation, so a subject would
have no knowledge of what the correct answers were, and thus passing
them to someone else was not possible.

Thus, to construct this cheating index, data on the relative time
taken to complete PT 1 and PT 6, and the comprehensive scores, were
examined. The first step was to calculate the percent change in time
to complete PT 1 and PT 6. Examination of this distribution indicated
that there seemed to be a break in the distribution at about the 30%
point. Thus, subjects who were more than 30% faster on PT 6 than on
PT 1 were considered potential cheaters. The comprehensive scores for
these subjects were then examined, and any of these subjects who
received a score of less than 75% on the comprehensive test was
considered a cheater. This procedure resulted in the elimination of 10
subjects. A second criterion was also employed. If a subject did very
poorly on the comprehensive test (less than 60%) and their percent
increase in speed from PT 1 to PT 6 was positive, they were considered a
cheater. This criterion eliminated one additional subject.
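The two screening rules can be expressed as a small function. Defining "percent faster" as the drop in completion time from PT 1 to PT 6 divided by the PT 1 time is our assumption (the report does not give the denominator); the thresholds of 30%, 75%, and 60% are the report's. The function names and example subjects are hypothetical.

```python
def percent_faster(pt1_minutes, pt6_minutes):
    """Speed-up from PT 1 to PT 6 as a percent of PT 1 time (assumed definition)."""
    return 100.0 * (pt1_minutes - pt6_minutes) / pt1_minutes


def flagged_as_cheater(pt1_minutes, pt6_minutes, comprehensive_pct):
    speedup = percent_faster(pt1_minutes, pt6_minutes)
    # Criterion 1: more than 30% faster on PT 6 and below 75% on the comprehensive test.
    if speedup > 30.0 and comprehensive_pct < 75.0:
        return True
    # Criterion 2: below 60% on the comprehensive test and any speed-up at all.
    return comprehensive_pct < 60.0 and speedup > 0.0


# Hypothetical subjects: (PT 1 minutes, PT 6 minutes, comprehensive %).
subjects = {"S01": (42.0, 26.0, 70.0), "S02": (40.0, 31.0, 88.0), "S03": (38.0, 35.0, 55.0)}
for sid, (t1, t6, comp) in subjects.items():
    print(sid, flagged_as_cheater(t1, t6, comp))
```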

Thus, the procedure resulted in the elimination of 11 subjects. It
is likely that some of these subjects were not, in fact, receiving
answers, but it was felt better to eliminate a non-cheater than to retain
a true cheater. Some evidence for the validity of the cheating index is
available in that there were four subjects who were known by the
instructors to have been cheating. All four were included in the 11
subjects eliminated by the cheating index. Furthermore, all but one
of the 11 eliminated were in the conditions where pay was dependent on
the number of PTs passed. It was in these conditions that passing a
large number of PTs was financially worthwhile.

These subjects were removed, and the principal analyses were repeated
on the remaining 44 subjects. Regressions were calculated, derived
effort scores were computed, and correlations between derived effort
and actual effort were recomputed. Table 6 presents the results of these
analyses for the additive model derived effort scores, for both Air
Force and simulation weights. The analogous correlations for the total
sample (cheaters included) are also repeated for comparison purposes.

Table 6

Correlations Between Derived Effort and Actual Effort with Cheaters
Removed and Included

                                  Correlation Between Derived Effort and Actual Effort
                                  Total Sample (N = 57)        Cheaters Excluded (N = 44)
Derived Effort Measure            AF Weights   Simulation      AF Weights   Simulation
                                               Weights                      Weights

Derived Effort - Total PT Time    -.09         -.29*           -.40**       -.39**
Derived Effort - Average          -.27*        -.03            -.26*        -.04
  Appraisal Score
Derived Effort - Comprehensive     .23*         .13            -.08         -.02
  Score
Derived Effort - Composite         .16          .32**           .37**        .35**

* p < .05
** p < .01

The table indicates that when the cheaters are excluded, the
predictive validity of the composite derived effort increases somewhat,
from .32 to .37 using simulation weights. More importantly, the
data suggest that when cheaters are removed, the composite derived
effort based on Air Force weights predicts as well as does that
based on simulation weights. Thus, the difference in the predictability
of effort found in the original analysis is probably due to the
presence of cheaters in the simulation.

The ANOVAs comparing actual performance (total PT time) and the
derived effort scores across the three conditions were also repeated
with cheaters excluded. The F-ratio for actual PT time was 3.86
(p = .029); for derived effort (additive model) using AF weights it
was 5.69 (p = .007), and when simulation weights were used it was 7.49
(p = .002). As with the comparable analysis with the total sample,
using derived effort results in more precision.

Although these results were more encouraging than the original
analyses, the level of predictability of effort was still low. However,
upon examining the scatterplots of derived effort with actual effort
once the cheaters were removed, it was observed that subjects with low
ability tended to be outliers from the main clustering of points. They
generally tended to have derived effort scores higher than their actual
effort. This suggested that actual level of ability might moderate
the relationship between derived effort and actual effort.

Consequently, the sample was split into high and low ability groups
on the basis of the simulation equation scores. Subjects whose
predicted performance (ability) was above the median were considered
high ability; those below the median were considered low ability.

Regression equations were then developed for the two groups
separately. Due to the small sample size (N = 22 per group), only two
predictors were used in these equations. These were the two best
predictors from the total sample. Derived effort scores were then
calculated for the subjects in each group based on the equations for
that group. Derived effort scores were calculated for only the two
major performance variables, total PT time and the composite.
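The median split followed by separate regressions can be sketched as follows. The least-squares helper, the function names, the convention that a subject exactly at the median goes to the low group, and the data are all hypothetical; the two columns of `X` stand in for the two best predictors from the total sample.

```python
from statistics import median


def ols_weights(X, y):
    """Least-squares weights [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0, *x] for x in X]
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def derived_effort_by_ability_half(ability, X, y):
    """Median-split on ability, fit each half separately, return additive residuals."""
    cut = median(ability)
    halves = {"low": [], "high": []}
    for i, a in enumerate(ability):
        halves["high" if a > cut else "low"].append(i)  # ties go to 'low' (assumption)
    effort = {}
    for idx in halves.values():
        w = ols_weights([X[i] for i in idx], [y[i] for i in idx])
        for i in idx:
            pred = w[0] + sum(b * v for b, v in zip(w[1:], X[i]))
            effort[i] = y[i] - pred
    return halves, effort


# Hypothetical data: predicted composite (ability), two predictors, actual composite.
ability = [48, 55, 61, 63, 66, 70, 72, 75, 80, 84]
X = [(95, 40), (99, 44), (102, 41), (104, 47), (106, 45),
     (110, 50), (112, 46), (115, 53), (118, 51), (121, 56)]
y = [58, 63, 60, 67, 66, 71, 65, 75, 72, 79]
halves, effort = derived_effort_by_ability_half(ability, X, y)
print({name: [round(effort[i], 2) for i in idx] for name, idx in halves.items()})
```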

Results of these analyses are presented in Table 7. As the size
of the correlations indicates, predictability is substantially
increased when the sample is broken down by ability. This implies that
ability and effort somehow combine differently for high and low
ability subjects.

Table 7

Correlations Between Derived and Actual Effort,
by High and Low Ability Subgroups

                                   Correlation Between Derived
                                   and Actual Effort
Derived Effort Measure             Low Ability    High Ability
                                   (N = 22)       (N = 22)

Derived Effort - Total PT Time     -.55**         -.43**
Derived Effort - Composite          .54**          .44*

* p < .05
** p < .01



Summary of the Results 

1. The simulation subjects were roughly comparable to the Air Force 
subjects. The major difference was that there were more high 
ability subjects in the simulation sample, but both groups showed 
good variability. 

2. The ability test battery predicted performance quite well. The
   composite performance index was predicted with a correlation of .59
   in the Air Force sample, .63 in the simulation sample, and .63 when
   the Air Force weights were applied to the simulation data. The
   degree of shrinkage was quite low.

3. The structure of the Air Force equations differed from that of the
   simulation equations, but they predicted simulation performance
   equally well, and the predicted scores from the two equations
   were highly correlated.



4. The hard criterion of effort displayed wide variability, excellent
   reliability, and good construct validity.

5. When the entire sample was used, derived effort did not predict
   actual effort particularly well.

6. Derived effort scores based on the multiplicative model did not
   predict effort any better than derived effort scores based on the
   additive model.

7. Derived effort scores discriminated the three experimental
   conditions better than actual performance did.

8. When cheaters were removed from the sample, prediction of effort 
increased somewhat. 

9. When separate analyses were conducted on high and low ability
   subsamples, prediction of effort was fairly good. The correlation
   for the composite was .44 for high ability subjects and .54 for low
   ability subjects.

Discussion and Conclusions

Given the results presented above, the major issue now becomes how
well the derived effort technique worked, and under what circumstances
it could be useful. It is to these issues that we now turn.

It is clear from the data that the derived effort measure produced
statistically significant correlations with actual effort in almost
every case. However, the presence of statistical significance is not
enough to justify utilization of the technique. The actual magnitude
of the relationship must be considered.

In considering this issue, we shall assume that the best estimate of
the correlation between predicted and actual effort comes from the
sample with the "cheaters" removed. There is good evidence that at
least some of the subjects were in fact getting the answers elsewhere
and, as such, they add artificial error variance. With this in mind,
our best estimate of the relationship between derived effort and actual
effort is obtained from the correlations between the composite derived
effort measure and actual effort. These relationships were in the mid to
upper .30's.

A relationship of this magnitude is not large enough to use for
individuals. That is, the derived effort index contains too much
error to be used for making decisions about a given individual.
However, it would be useful for group data. That is, if one group had
a substantially higher derived effort score than another, it would be
fairly safe to conclude that the group with the higher mean derived
effort score was exerting higher effort. We are essentially arguing
that within a group, the error associated with each individual's derived
effort score should be randomly distributed across the members of the
group. As a consequence, the mean derived effort score across the group
should reflect its average effort.
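The group-averaging argument assumes that each individual's error is random and independent of the others'. Under that assumption, the error in a group mean shrinks roughly in proportion to one over the square root of the group size, which a short simulation illustrates. The function and its default settings are hypothetical and are not taken from the report's data.

```python
import random
from statistics import fmean, pstdev


def sd_of_group_mean_error(group_size, error_sd=1.0, n_groups=4000, seed=1):
    """Simulated SD of a group's mean error when individual errors are
    independent, normally distributed, and centered on zero (assumptions)."""
    rng = random.Random(seed)
    means = [fmean([rng.gauss(0.0, error_sd) for _ in range(group_size)])
             for _ in range(n_groups)]
    return pstdev(means)


for n in (1, 4, 25):
    print(f"group of {n:>2}: SD of mean error ~ {sd_of_group_mean_error(n):.3f}")
```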

The data are more encouraging when the analyses broken down by high
and low ability are considered. In these analyses, correlations between
predicted and actual effort ranged from the mid .40's to the mid .50's.
Correlations of this size are approaching a level of magnitude that
might be useful for individual predictions. They still contain a large
amount of error, and using the derived effort measure with such
relationships would have to be done with caution. However, relationships
of this magnitude would be extremely useful for groups.

The real issue here is whether the larger relationships found when
the sample was broken down into high and low ability would, in fact,
replicate in another sample. On a post hoc basis it is not unreasonable
to argue that they would. It seems quite possible that ability and
effort do in fact combine somewhat differently for low and high ability
trainees. It remains to be seen, however, whether the finding would
replicate.

In the introduction to this report, four major advantages were
claimed for the derived effort index. It was argued that the technique:
(1) would be an individualized procedure; (2) when used in an incentive
system, would allow rewards to be given on the basis of effort rather than
performance; (3) if used in an incentive system, would give trainees an
equal chance to get incentives regardless of their ability; and (4)
would be a useful measure of effort with which to compare different
groups. We shall now discuss each of these potential advantages in
the context of the results obtained from this research.

(1) Individualized. The derived effort technique is indeed
individualized. It utilizes data from the individual: his ability is
considered, and the predicted score generated from his ability data is
compared to his actual performance. However, whether a technique is
individualized per se is really a matter of instructional philosophy.
The technique must also have other utility if one is to argue for its
use.

(2) Reward can be based on effort. The data suggest that if the
derived effort measure were used as an index of individual effort, and
incentives were awarded on the basis of the derived effort scores for
individuals, there would be too much error in the system to enable one
to say that rewards were in fact given on the basis of effort. Thus, the
data indicate that when used for individuals the technique will not give
rewards based on effort. However, if groups are given incentives on the
basis of mean derived effort scores, one could argue that rewards were
based on effort. This argument rests on the reasonable assumption that
errors of predicting actual effort from derived effort would be random,
and thus mean differences between groups would reflect actual differences
in effort. This is especially true if scores are based on high and low
ability groups separately.






(3) Trainees have an equal chance to get incentives. Regardless of
whether the derived effort score is a good measure of effort, it still
has the definite advantage of tending to equalize the chances of earning
incentives. Since high ability trainees have higher predicted scores
than low ability trainees, the higher ability people must perform at
a higher level to obtain incentives. Thus, lower ability students should
see that obtaining incentives is more within their power than in the
situation where actual raw performance is rewarded. In this respect, the
derived effort score is useful for lower ability trainees.

(4) Useful for comparing different groups. Clearly, the derived effort
technique would be useful for comparing effort in different groups. For
example, if a new instructional technique was introduced, and one wished
to compare the level of motivation produced by the two techniques, the
derived effort index would be an excellent way to do this. This is
especially true when the actual levels of ability of the samples exposed
to the two techniques differed.

More importantly, the derived effort index gives one a common metric
with which to compare different courses. In many cases it would be
possible to compare directly different courses whose content,
examinations, and formats were quite different. We shall consider this
in more detail below.

Suggestions for Implementation

Thus far we have considered the ways in which the derived effort
index could be used only in a very general way. The conclusion is that
it is a very useful index of group effort, but probably not appropriate
for individual assessment. Within this restriction, however, it has a
number of very practical uses. The research was originally conducted
with a view toward the Advanced Instructional System (AIS), and the
applications to be discussed apply directly to the AIS. However, many
of them could be used in any training course.

(1) Within a given course. A very useful application of the
derived effort index would be to compare different parts of a given
course. The issue is whether different parts (e.g., blocks) of a course
result in greater levels of motivation than others. If this could be
determined, it would be of great aid in redesigning courses. The
procedure would basically involve generating the predicted scores for
total course performance and then breaking them down by block. Let us
use time to complete the course as the criterion of interest for an
example, although exam scores or any other criterion could be used. One
would first predict speed of completion of the entire course from the
ability data. However, since the blocks vary in amount of material,
the prediction of speed of completion must take this into account.
Suppose, for example, there were three blocks. Based on available data,
trainees average 20 days on the first block, 30 days on the second, and
50 days on the third. Thus, average time to complete is 100 days. The
first block represents 20% of total time; the second, 30%; and the third,
50%. The predicted time to complete the course for a given subject
would then be apportioned according to these percentages to obtain a
prediction of his time to complete each block. If a given subject
received a predicted score of 110 days, we would predict he would
complete block 1 in 22 days (20% × 110), block 2 in 33 days, and
block 3 in 55 days. His actual time to complete each block would then
be subtracted from his predicted time.

Once this is done for all students, the mean derived effort by
block could be calculated. If the three blocks result in equal
motivation, the derived effort index should be near zero for all blocks.
If the three derived effort means are substantially different, it would
indicate that differences in motivation were present.
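The block-by-block procedure can be sketched with the 20/30/50-day example. The block labels, the function names, and the second student's figures are hypothetical; following the text, derived effort is predicted days minus actual days, so positive values mean a block was finished faster than predicted.

```python
from statistics import fmean

# Share of total course time per block, from the 20/30/50-day example.
BLOCK_SHARE = {"block 1": 0.20, "block 2": 0.30, "block 3": 0.50}


def block_effort(predicted_total_days, actual_days):
    """Predicted minus actual days for each block (positive = faster than predicted)."""
    return {b: share * predicted_total_days - actual_days[b]
            for b, share in BLOCK_SHARE.items()}


def mean_effort_by_block(students):
    """students: list of (predicted_total_days, {block: actual_days})."""
    scores = [block_effort(pred, actual) for pred, actual in students]
    return {b: fmean([s[b] for s in scores]) for b in BLOCK_SHARE}


students = [
    (110, {"block 1": 20, "block 2": 35, "block 3": 55}),  # predicted 22 / 33 / 55 days
    (95, {"block 1": 21, "block 2": 27, "block 3": 49}),   # hypothetical second student
]
print(block_effort(*students[0]))
print(mean_effort_by_block(students))
```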

This procedure could be refined by doing separate analyses for high
and low ability students. The results would indicate whether high and
low ability students exhibited the same pattern of motivation across
blocks.

(2) Changes in a course. Another application of the derived
effort index is where a change is made in the structure or procedures
of a given course. If students in the original format of the course
have ability levels equal to those in the course after it has changed,
simple performance measures could be used. However, as the ANOVAs in
our data have indicated, using the derived effort index gives a more
precise test of effects, since ability variation within conditions is
controlled. The real advantage of the derived effort index comes
about when the actual ability of the two samples is not equal. Then,
the derived effort index would be very useful.

(3) Comparisons between different courses. As we have discussed
above, the derived effort index could be very useful in comparisons
between different courses. One procedure would be simply to calculate
derived effort for the two courses and compare the two means. However,
if the regression equations are based on the same sample upon which
the derived effort scores are calculated, both means should be zero.
When one course or the other is changed in some way, though, the
derived effort index should be able to detect changes relative to
the other course. One could assess, for example, whether a technique
used in one course had an equal impact on another course.

Another procedure would be to go back to data obtained from students
who had completed the course at a given time and develop the regression
equations on that sample. If these equations were applied to another
(e.g., more recent) sample, differences in mean derived effort would be
more interpretable.

(4) Feedback. Another application of the derived effort index would
be for feedback purposes. Instructors could be given the mean derived
effort score for each section they taught, and the change in this value
from class to class could be useful information. Students could also
be given such feedback on a group basis, which could provide them
information about their motivation.
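Section-level feedback of this kind could be tabulated as below. The class labels and scores are hypothetical. The sketch reports each class's mean derived effort and its change from the previous class.

```python
from statistics import fmean


def section_feedback(classes):
    """classes: ordered list of (class_label, derived_effort_scores).
    Returns (label, mean derived effort, change from previous class)."""
    report = []
    previous = None
    for label, scores in classes:
        mean = fmean(scores)
        change = None if previous is None else mean - previous
        report.append((label, mean, change))
        previous = mean
    return report


classes = [                                  # hypothetical classes
    ("Class 1", [1.0, -2.0, 0.0, 1.0]),
    ("Class 2", [2.0, 1.0, 0.0, 1.0]),
    ("Class 3", [0.0, -1.0, 1.0, -2.0]),
]
for label, mean, change in section_feedback(classes):
    print(label, mean, change)
```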

(5) Goal setting. One could also use the derived effort index on
an individual basis in the context of goal setting. As we have argued
above, the derived effort index for a single individual is not an
accurate index of his own effort. However, if the individual's
predicted score was presented to him as a basis for goal setting, it
would at least have meaning in the sense that it represents the average
performance for people of comparable ability. An easy goal could be to
meet the predicted score; a hard goal could be to perform at a level
one standard deviation above the predicted score. In fact, one could
readily list a number of performance goals and indicate the objective
probability (based on the development sample) of attaining each goal.
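A sketch of such a goal table, under three assumptions not stated in the text: performance is course completion time (so a harder goal means finishing sooner), the standard deviation is that of derived effort in the development sample, and the "objective probability" is estimated as the proportion of the development sample that met each goal. Names and data are hypothetical.

```python
from statistics import pstdev


def goal_table(predicted_days, dev_effort, margins):
    """For each margin (days faster than the predicted time), return the goal
    time and the share of the development sample whose derived effort
    (predicted minus actual days) was at least that margin."""
    n = len(dev_effort)
    return [
        (predicted_days - m, sum(e >= m for e in dev_effort) / n)
        for m in margins
    ]


# Hypothetical derived effort scores (days) from a development sample.
dev_effort = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
sd = pstdev(dev_effort)

# Easy goal: meet the predicted time. Hard goal: one SD faster than predicted.
for goal, p in goal_table(110, dev_effort, [0.0, sd]):
    print(f"finish in {goal:.1f} days: observed rate {p:.3f}")
```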

Overall, then, the results of the present research indicate that
while the derived effort index should probably not be used as a
measure of effort for a single individual, the index can be considered
a measure of group effort. As such, it has a number of very useful
applications in both the AIS and other training contexts.



References . 

Appelzweig, M.H., Moeller, G. , & Burdick, H. Multi-motive prediction of 
academic success. Psychology Reports , 1956, 2^, 489-496. 

Atkinson, J.W. An introduction to motivation . Princeton, N*^J^ : Van 
Nostrand, 1964. 

Barnes, R.M. Motion and time study . N^ew York: Wiley and Sons, 1940. 

Bitterman, M.E. likart rate and f requency--of blinking as indices of 
visual efficiency. Journal of Experiment^al Psychology , 1945, 35, 
279-292. 

Braunstein, D.H. , Bruanstein, H.M. , & BltjmfJeld, W.J. Performance- in 
training and an achievement effort rarCng. Psychology Reports , 
1965<T6, 1077-1080. 

Campbell, J.C., Dunnette, M.D. , Lawler, E.E., & Weick, K.E. Managerial 
Behavior,, performjance, and effectiveness , McGraw-Hill, 1970. 

Davis, R.C. Patterns of muscular act^ivity during mental "work ^ and their 
constancy. Journal of Experimental Psychology", 193SI, 24^, 451-465. 

Farquhar, W. H. A comprehensive study of the motivational factors 
underlying achievement of eleventh grade high school students . 
East Lansing: Michigan State University, 1963. 

• ^ 

Ford, F.A. The ratio of achievement to ability .as fotmd among fifth- 
grade pupils . Contribution to Education, No. 94, George Peabody 
College for Teachers, Nashville, Tenn. 1931. 

Furst, E.J. Validity of some objective scales of motivation for 
predicting academic achievement. Education and Psychological 
Measurement , 1966, 26(4), 927-933. 

Hackman, R. , & Porter, L.W. Expectancy theory predictions of work 
effectiveness. Organizational Beha\'lor and Human Perfon!ance > 
1968, 3, 417-426. ^ ^ W 

Haven, S.E. The relative effort of children of native versus foreigi||/i( 
bom parents. Journal of Educational Psychology , 1931, 22^, 523-5367^ 

Lawler, E.E. Pay and managerial effectiveness: A Psychological view . 
New York: McGraw-Hill, 1971. 

Lawler, E.E., &.Pprter, L.W. Antecedent attitudes of effective 
'managerial performance. Organizational Behavior and Human 
Performance, 1967, 2, 122-142. . u 



36 

ERiC : ^ AO 



Luekel, F. Introduction to psychological psychglbgy . Saint Louis: 
• C.V. Mosby, 1968. - 



Lovekin, O.S. ^fee quantitative measurement of human efficiency liiider 
. factor conditions. Journal of Industrial Hygiene, and Toxicology , 
/ 1930, 12, 99-120. 

Luchins, A.J. , & Luchins, E.H,. The Einstellmg phenomenon and effort- 
fulness of task. Journal of General Psychology , 1954, 50, 15-27. 

Mayo, R.J. The conceptual representation and measurement of humjgn 
effort . Unpublished paper, Purdue University, 1974. 

Mayo, G.D., & Manning, W.H. Motivation measurement. Education and 
" Psychological Measurement , 1961, 21 (1^),* 73-83. 

McCall, W.A. How to Experiment in Education . New Y'^rk: Macmillian, 
1930. 

McClelland , D . C . St udies in Motivation . New York : Appleton-Century- 
Croft, Inc. , 1955. 

Mitchell, V.F. The relationship of effort, abilities, and role 
" perceptions to managerial performance . Unpublished doctoral 
dissertation. University of California, Berkeley, 196&. 

New Yorl^ Institute of Technology, A programmed text in basis electricity 
New York: McGraw-Hill, 1963. 

^New York Institute of Technology, A progrananed text in basic electronics 
New York: McGraw-Hill, 1964. 

Pinter, R. Twenty first yearbqok , 1930. 

Porter, L..W* & Lawler, E.E. Managerial attitudes and performance. 
Homewood, 111.: Irwin Dorsey, 1968. 

P res grave, R. The dynamics of time study . Toronto: University of 
Toronto Press. 1954. 

Pritchard, R.D. , Leonard, D.W., Von Bergen, C.W. Jr., & Kirk, R.J. 

The effects of varying schedules of^ incentive delivery on tetchnical 
training AFHRL-TR-74-32 , Air Force Human Resources Laboratory, 
Lowry Air Force Base, ODlolrado, (1974). AD- A001117 



Pritchard, R.D., Von Bergen, C.W. Jr. & DeLeo, P.J. Incentive 

motivation techniques evaluation in Air Force Technical Training 
AFHRL-TR-74-24, Air Force Human Ripsources Laboratory, Lowry 
Air Force Base, Colorado, (1974).. AD-A005302. 



37 • ^ 



Ramby, S,M. The evaluation of an indirect technique for assessing 
effort . Uhpublsihed M.S. thesis, Purdue University, 1973. 



Ryan, T.A. Work and effort s New York: Ronald Press, 1974. , 

Schneider, B,, & Olsen, L.K. Effort as a correlate of organizational » 
reward system and individual values. Personnel Psychology ,^ 1970, 
23, 313-326. 

Scott, S.Gi Auditory flutter fussion as a measure of efforts 
Dissertation Abstracts International , 1960, 21^, 1926. ^ 

Solomon, R.L. Time and effort factors in the avoidance of repetition of 
responses. American Psychologist > 1946, 1> 291-292. 

Strauss, G. , & Sayles, L.R. Personnel; The Human Problems of 
Management . New York: Prentice-Hall , 1960. V * 

Thomdike, E^L. Educational Wychblogy; Volume I, the original nature 
of man l New Yorks Columbia University Press, 1913. 

Tsao, F. Is A^ or F score the last word in detend.ning individual , 
effort? Journal of Educational Psychology , 1943, 24 (9^), 513-525. 

Vroom, V. Work and motivation . New York; Wiley ,^9 64. * 

' Yacorzynski , O.K. * Degrees of effort II: quality of work and time of 

completion of performance tests. Journal of Experimental Psychology 
1942, 30, 342-344. 






APPENDIX A
Advertisement Used to Recruit Subjects







We are looking for 20 people between 17 - 19 years of age
to work for one week on a job evaluating written training materials.
No special skills are required. The pay will be approximately
$2.00 per hour, depending on what you do. The work day will be
from 8:30 A.M. to 1:30 P.M., Monday through Friday, June 17 - 21.
If you would like a week's work, report at 8:30 A.M., Monday,
June 17, to the conference room in the Holiday Inn, U.S. 24 East,
Logansport, Indiana.






APPENDIX B
Pre-employment Electricity Test






1. What type of aircraft is a KC-135A?

2. What is galvanic corrosion?

Define the following terms and symbols:
3. ampere
4. static discharger
5. magnetic permeability

8. buttock lines

9. multimeter






APPENDIX C 
SidD]^^ Api^ralsal 






DEPARTMENT OF THE AIR FORCE                            AS/Q/3ABR42330-107
HQ USAF School of Applied Aerospace Sciences (ATC)     5 December 1973
Chanute Air Force Base, Illinois 61868                 Page 1

APPRAISAL* 

COURSE: Aircraft Electrical Repairman

SUBJECT: DC Generation and Basic Circuit Symbols and Terms

INSTRUCTIONS: Follow the directions given in each section.

Section I 

OBJECTIVE: Given the names of electrical components, identify each
component that belongs to one of the following categories: a. source
of EMF, b. protective devices, c. control devices, d. load devices, and
e. conductors. A minimum of 80% accuracy is required.

Match the terms on the right to the components on the left. Place the
letter of the term in the blank provided by the component. The terms
may be used more than once.

COMPONENTS                          TERMS

_____ 1. Fuse                       a. Conductor
_____ 2. Motor                      b. Load Unit
_____ 3. Lamp                       c. Source of EMF
_____ 4. Circuit Breaker            d. Control Device
_____ 5. Generator                  e. Protective Device
_____ 6. Thermocouple
_____ 7. Resistor
_____ 8. Aircraft Structure
_____ 9. Battery
_____ 10. Switch







Section II 



OBJECTIVE: Given a list of electrical symbols and a list of units and
terms, match the symbols with their respective unit or term. A
minimum of 80% accuracy is required.



*The above appraisal has been discontinued.


Match the terms with the symbols below. Place your answer in the blank
provided by the term.

TERMS

_____ 1. Fuse            _____ 6. Circuit Breaker      _____ 11. Voltage
_____ 2. Lamp            _____ 7. Fixed Resistor       _____ 12. Current
_____ 3. Battery         _____ 8. Variable Resistor    _____ 13. Amperes
_____ 4. Ammeter         _____ 9. Thermocouple         _____ 14. Ohms
_____ 5. Generator       _____ 10. Resistance          _____ 15. Volts

SYMBOLS

[Symbols a. through o.: circuit-symbol drawings, not reproduced]

Section III



OBJECTIVE: Given a list of definitions and a list of DC generation
terms, match the definitions with their proper terms. A minimum of
80% accuracy is required.

Match the definitions on the right to the terms on the left, and
record your answers in the blanks provided by the terms.



TERMS

_____ 1. Generator
_____ 2. Battery
_____ 3. Thermocouple
_____ 4. Mechanical Method
_____ 5. Heat or Thermal Method

DEFINITIONS

Voltage produced when heat is applied to two dissimilar metals
that are joined at one end.

Device using mechanical energy to produce an EMF

Device using heat to produce an EMF

Device using the chemical method to produce an EMF

Voltage produced by relative angular motion between conductors and a
magnetic field






APPENDIX D
Effort Questionnaire






PART II 

Circle the number that best describes your feelings. You may circle any
number. If you feel you are, for example, between the statement in
number 7 and the statement in number 5, circle number 6.

1. On this job I am working

   9. As hard as I possibly can
   8.
   7. Fairly hard, but not killing myself
   6.
   5. About average
   4.
   3. Not very hard
   2.
   1. I am taking it easy

2. In terms of the total amount of effort I could put in on this
   job, I am putting in about:

   1. 10%
   2. 20%
   3. 30%
   4. 40%
   5. 50%
   6. 60%
   7. 70%
   8. 80%
   9. 90%



U.S. GOVERNMENT PRINTING OFFICE: 1975



